Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm
Abstract
Selecting the right word translation among several options in the lexicon is a core problem for machine translation. We present a novel approach to this problem that can be trained using only unrelated monolingual corpora and a lexicon. By estimating word translation probabilities using the EM algorithm, we extend upon target language modeling. We construct a word translation model for German and English noun tokens, with very promising results.
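The abstract does not spell out the model, so the following is only a minimal sketch of one plausible reading: treat each source sentence as generated from a hidden sequence of target words, keep a target-language bigram model fixed, and use EM (forward-backward) to re-estimate p(source | target) over the candidates the lexicon allows. The function name `em_translation_probs`, the toy lexicon, the toy corpus, and the bigram values are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def em_translation_probs(corpus, lexicon, bigram, iterations=10):
    """Estimate p(source | target) by EM with a fixed target bigram model.

    corpus  : list of source-language sentences (lists of words)
    lexicon : dict mapping each source word to its candidate target words
    bigram  : function (prev_target, target) -> p(target | prev_target);
              prev_target is None at the start of a sentence
    Note: no log-space scaling is used, so this sketch suits short sentences only.
    """
    # Uniform start: each target spreads its mass over the source words listing it.
    sources_of = defaultdict(set)
    for s, targets in lexicon.items():
        for t in targets:
            sources_of[t].add(s)
    emit = {(s, t): 1.0 / len(srcs)
            for t, srcs in sources_of.items() for s in srcs}

    for _ in range(iterations):
        counts = defaultdict(float)
        for sent in corpus:
            if not sent:
                continue
            n = len(sent)
            cands = [lexicon[s] for s in sent]
            # Forward pass over the lattice of candidate translations.
            fwd = [{t: bigram(None, t) * emit[(sent[0], t)] for t in cands[0]}]
            for i in range(1, n):
                fwd.append({
                    t: emit[(sent[i], t)]
                    * sum(fwd[i - 1][p] * bigram(p, t) for p in cands[i - 1])
                    for t in cands[i]
                })
            # Backward pass.
            bwd = [None] * n
            bwd[n - 1] = {t: 1.0 for t in cands[n - 1]}
            for i in range(n - 2, -1, -1):
                bwd[i] = {
                    t: sum(bigram(t, q) * emit[(sent[i + 1], q)] * bwd[i + 1][q]
                           for q in cands[i + 1])
                    for t in cands[i]
                }
            total = sum(fwd[n - 1].values())
            if total == 0.0:
                continue
            # E-step: posterior probability of each candidate at each position.
            for i in range(n):
                for t in cands[i]:
                    counts[(sent[i], t)] += fwd[i][t] * bwd[i][t] / total
        # M-step: renormalise expected counts per target word.
        totals = defaultdict(float)
        for (s, t), c in counts.items():
            totals[t] += c
        emit = {(s, t): (counts[(s, t)] / totals[t] if totals[t] > 0 else p)
                for (s, t), p in emit.items()}
    return emit


# Hypothetical toy data: German nouns with English candidates.
lexicon = {"Bank": ["bank", "bench"], "Sitzbank": ["bench"],
           "Park": ["park"], "Geld": ["money"]}
corpus = [["Park", "Bank"], ["Geld", "Bank"], ["Park", "Sitzbank"]]
table = {("park", "bench"): 0.5, ("money", "bank"): 0.5}


def bigram(prev, t):
    # Assumed toy target language model: flat start, small floor for unseen pairs.
    return 0.2 if prev is None else table.get((prev, t), 0.01)


probs = em_translation_probs(corpus, lexicon, bigram)
```

In this sketch, `probs[("Bank", "bench")]` holds the estimated p(Bank | bench). A translation for a source token in context could then be chosen by combining these probabilities with the same bigram model, for example with a Viterbi search over the candidate lattice.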
Similar resources
Translation Lexicon Estimates from Non-Parallel Corpora Pairs
The estimation of translation lexicon probabilities from parallel corpora is well studied in statistical machine translation. Whenever parallel corpora are not available, it is still possible to obtain unsupervised estimates from pairs of monolingual, non-parallel corpora. In both cases the standard estimator is the Expectation-Maximization (EM) algorithm, which aims at increasing the likelihood of the sou...
Estimating Word Translation Probabilities for Thai – English Machine Translation using EM Algorithm
Selecting, from a set of target-language words, the translation that conveys the correct sense of the source word and yields more fluent target-language output is one of the core problems in machine translation. In this paper we compare three methods of estimating word translation probabilities for selecting the translation word in Thai – English Machine Translation. The three methods are (1) Metho...
Collocation Translation Acquisition Using Monolingual Corpora
Collocation translation is important for machine translation and many other NLP tasks. Unlike previous methods using bilingual parallel corpora, this paper presents a new method for acquiring collocation translations by making use of monolingual corpora and linguistic knowledge. First, dependency triples are extracted from Chinese and English corpora with dependency parsers. Then, a dependency ...
Learning A Translation Lexicon From Monolingual Corpora
This paper presents work on the task of constructing a word-level translation lexicon purely from unrelated monolingual corpora. We combine various clues such as cognates, similar context, preservation of word similarity, and word frequency. Experimental results for the construction of a German-English noun lexicon are reported. Noun translation accuracy of 39% scored against a parallel test co...
Collocation Extraction Using Monolingual Word Alignment Method
Statistical bilingual word alignment has been well studied in the context of machine translation. This paper adapts the bilingual word alignment algorithm to the monolingual scenario to extract collocations from a monolingual corpus. The monolingual corpus is first replicated to generate a parallel corpus, where each sentence pair consists of two identical sentences in the same language. Then the mon...